Abstract

We introduce a concept called coherence for signals and constraints in compressed sensing. In our setting, we assume that the signal can be observed in finitely many features, and the set of possibly observable feature combinations forms an analytic variety which models the compression constraints. We study the question how many random measurements of the feature components suffice to identify all features. We show that the asymptotics of the sufficient number of measurements is determined by the coherence of the signal; furthermore, if the constraints are algebraic, we show that in general the asymptotics depend only on the coherence of the constraints, and not on the true signal, and derive results which explain the form of known bounds in compressed sensing. We exemplify our approach by deriving sufficient sampling densities for low-rank matrix completion and distance matrix completion which are independent of the true matrix.

1. Introduction 

1.1. The general setting Compressed sensing can be formulated as the task of recovering a signal $x = (x_1, \dots, x_n) \in \mathbb{K}^n$, where $\mathbb{K} = \mathbb{R}$ or $\mathbb{K} = \mathbb{C}$, from noisy measurements of some of its entries $x_i$. Clearly this is impossible without some structural assumption; i.e., that $x$ belongs to some subset $X \subset \mathbb{K}^n$ of signals with intrinsic properties that imply relations among the $x_i$. Well-known examples are: support on a sparse set of Fourier coefficients, the $x_i$ are entries of a low-rank matrix, or that the $x_i$ are squared distances. One can think of $\mathbb{K}^n$ as providing a parametric, or feature representation of $x$, and $X$ as determining either compression constraints or signal properties.

A central question in compressed sensing is always which fraction of the observations are sufficient to reconstruct $x$ with suitable accuracy. A common measurement model is the Erdős-Rényi sampling model^3, where each noisy entry $x_i$ is observed independently with some probability $p$. Of particular interest is the asymptotic behavior: for a family of problems with $n \to \infty$, determine a function $f(n)$ such that $O(f(n))$ (random) measurements suffice for reconstruction of $x$. Most known results are of this kind, and many known results in different settings interestingly take the form

$$f(n) = c(x) \cdot \dim(X) \cdot \log^k(n) \qquad (1)$$

for some integer $k$, where $c(x)$ is some function measuring a property of $x$, often with $c(x)$ bounded independently of $n$, and $\dim(X)$ is the number of degrees of freedom of the signal space. For an overview on different compressed sensing problems and some asymptotic results, see for example Donoho (2006), Candès and Romberg (2007) or Candès and Wakin (2008).

^1 Machine Learning Group, TU-Berlin. franz.j.kiraly@tu-berlin.de
^2 Inst. Math., FU-Berlin. theran@math.fu-berlin.de
^3 The Erdős-Rényi model has, in most interesting cases, the same asymptotics as the uniform sampling model, where a fixed number of entries is observed uniformly.



A property of $x$ which often plays a role in $c(x)$ is the coherence, or incoherence, of $x$. In different settings, the coherence of $x$ is defined differently (classically for wave functions and time series, and, more recently, also for matrices, see Candès and Recht (2009)), but there is a common principle: the more incoherent a signal is, the more the signal itself is independent from the sampling process; as this also concerns information about the signal, the measurement of a more incoherent signal will provide more information than the measurement of a more coherent one. Thus, in the above formula, $c(x)$ will be smaller for incoherent $x$ and bigger for coherent $x$.

In this paper, we provide an explanation for this behavior and a general framework for determining the measurement asymptotics of compressed sensing. Qualitatively, we will prove that

$$f(n) = \operatorname{coh}(X) \cdot n \log n \qquad (2)$$

suffices for the (local) identifiability of a general and noiseless signal, under the condition that the constraint $X$ is algebraic (or close to algebraic). Identifiability means that the set of signals is, in general, potentially reconstructible from the set of measurements. The quantity $\operatorname{coh}(X)$, which we will define, fulfills $\operatorname{coh}(X) \le 1$ and is called the coherence of (the constraints given by) $X$. Note that $f(n)$, as given above, no longer depends on properties of the signal $x$ to reconstruct, only on properties of the constrained signal space $X$.

Moreover, if $X$ is well-behaved, i.e., incoherent with the measurement process, then one will have $\operatorname{coh}(X) = O(\dim(X)/n)$, so in this case the asymptotics of the sufficient sampling density becomes

$$f(n) = \dim(X) \cdot \log n, \qquad (3)$$

which also explains the form of the first equation, now with the influence of signal properties $c(x)$ removed.

1.2. Fixed coordinates and the logarithmic term Before continuing with specific examples, we want to point out an important conceptual point in the setting. Equations (2) and (3) involve a logarithmic term. Intuitively speaking, the number of measurements needed to obtain reconstruction is at least $\log n$ times the number of degrees of freedom $\dim(X)$ of the signal space. It can be counterintuitive that the $\log n$ term is in the bound, in particular in view of the following result, which is probably folklore and states that $\dim(X)$ measurements of general linear projections suffice.

Theorem 1.1. Let $X \subset \mathbb{K}^n$ be an algebraic variety of dimension $d$, let $x \in X$. Let $\ell : \mathbb{K}^n \to \mathbb{K}^m$ be a generic linear map. If $m > d$, then $x$ is uniquely determined by the values of $\ell(x)$ and the condition that $x \in X$.

A proof is given in the appendix, see Theorem A.1. In view of this theorem, which is a statement on noise-free identifiability, one might now hypothesize that $f(n) = \dim(X)$, without the $\log n$ term, might always be the true sufficient asymptotics when there is no noise and $x$ is sampled reasonably.

Nevertheless we claim that the bound with logarithm is the best possible one for the given setting, and that the $\log n$ term is not due to noise or sampling of $x$. Namely, lower bounds including the $\log n$ term are known in the noise-free case which furthermore are independent of the particular sampling process of the signal, see Király et al. (2012a) for the case of matrix completion. The reason that the logarithmic term is necessary is that the coordinates of the signal $x$ are fixed, i.e., the bounds (2) and (3) hold for any fixed choice of coordinates, while Theorem 1.1 is a statement which holds for a generic choice of coordinates (i.e., with "random" coefficients). Since the coordinates defining our coordinate projections are fixed and possibly degenerate, they are not generic in the sense of Theorem 1.1.



1.3. Examples for algebraic compressed sensing In this paper, we consider two algebraic 
compressed sensing problems - low-rank matrix completion and distance matrix completion - 
and derive sufficient reconstruction bounds by applying the coherence framework. 



Low-rank matrix completion In the low-rank matrix completion problem the signal is an unknown matrix $A$, the measurements are the entries of $A$. The structural assumption is that $A$ has rank at most $r$. The set of matrices of rank at most $r$ is a determinantal variety, for example the variety of $(m \times n)$ matrices $X = \mathcal{M}(m \times n, r)$; it has $\dim(\mathcal{M}(m \times n, r)) = r(m + n - r)$ degrees of freedom. Results have been obtained in the case where the entries are observed with sampling probability $p$. Results of Candès and Tao (2010), Keshavan et al. (2010) and Király et al. (2012a) imply^4:

Theorem 1.2. Let $r \in \mathbb{N}$ be a fixed constant, and let $A$ be an $(m \times n)$ matrix with $m \le n$ that is incoherent. Then, there are constants $c > 0$ and $C > 0$ depending only on $r$ such that, if each entry of $A$ is sampled with probability $p$, then w.h.p.^5, if

$$p \le c \cdot n^{-1} \log n,$$

then $A$ cannot be reconstructed from the observed entries and the knowledge that $A \in X$, and if

$$p \ge C \cdot \lambda \cdot n^{-1} (\log n)^2,$$

then $A$ can be reconstructed from the observed entries and the knowledge that $A \in X$.



Due to a coupon collector's effect observed by Candès and Tao (2010), the order of $p$ cannot be improved. We call attention to the fact that the assumption of incoherence, which will play a key role here as well, is on the true matrix $A$ itself. Thus, if we take the view of Király et al. (2012a), which treats low-rank matrix completion as a parametric estimation problem, Theorem 1.2 bundles together: (a) the sampling of the true matrix $A$; and (b) the sampling of the observed entries.
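The coupon collector's effect can be illustrated numerically. The following sketch is not taken from the cited works: it assumes the standard heuristic that a matrix cannot be completed if some row has no observed entry at all, and computes the expected number of such rows of an $(n \times n)$ matrix when $p = c \cdot n^{-1} \log n$. The function names and the constants $c = 0.5$ and $c = 2$ are illustrative choices.

```python
import math


def expected_empty_rows(m: int, n: int, p: float) -> float:
    """Expected number of rows of an (m x n) matrix with no observed entry,
    when each entry is observed independently with probability p."""
    return m * (1.0 - p) ** n


def sampling_probability(n: int, c: float) -> float:
    """Sampling probability p = c * log(n) / n (c is an illustrative constant)."""
    return c * math.log(n) / n


n = 10_000
# Below the threshold (c < 1): many rows are expected to be entirely unobserved.
sparse = expected_empty_rows(n, n, sampling_probability(n, 0.5))
# Above the threshold (c > 1): an unobserved row is unlikely.
dense = expected_empty_rows(n, n, sampling_probability(n, 2.0))
print(sparse, dense)
```

Since $n (1 - p)^n \approx n^{1 - c}$ for this choice of $p$, the expected number of empty rows grows with $n$ for $c < 1$ and tends to zero for $c > 1$, which is consistent with a threshold of order $n^{-1} \log n$.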

Our coherence framework not only allows us to generalize the statement but also to disentangle the sampling of the signal and the randomness in the observation process.



Distance matrix completion In the distance matrix completion problem, the signal is a distance matrix (also sometimes called similarity matrix), i.e., an $n \times n$ matrix $A$ such that $A_{ij} = \|p_i - p_j\|^2$ for some set of points $p_1, p_2, \dots, p_n \in \mathbb{R}^r$. The structural assumption is that $A$ is a distance matrix, which is a non-trivial assumption: namely, the distance matrices make up a $\left(rn - \binom{r+1}{2}\right)$-dimensional algebraic set $\mathcal{C}(r, n)$, a subset of $\binom{n}{2}$-space known as (real points of) the Cayley-Menger variety (see Borcea (2002)). Again, entries are measured independently with fixed probability $p$. Coordinate projections of distance matrices are also known as bar-joint frameworks, which are central objects in rigidity theory (see, e.g., the monograph of Graver et al. (1993)); there, a framework is called generically rigid if $A$ is reconstructible from the measurements up to finite choice. A well-studied question introduced by Thorpe (1983) relates the rigidity properties of frameworks to the topology of an Erdős-Rényi random graph $G_{n,p}$. The only known (non-trivial) asymptotic results concern point dimensions $r \le 2$:

^4 What is stated here is a specialization, since the original theorems allow the rank to grow.
^5 w.h.p. = as $n \to \infty$, the probability of the statement which follows approaches 1.

Theorem 1.3. The following statements hold:

1. Let $A \in \mathcal{C}(2, n)$ be a distance matrix. Then, $A$ can be reconstructed from the observed entries (i.e., is generically rigid) if and only if $p \ge n^{-1}(\log n + 2 \log\log n + \omega(1))$ (Jackson et al., 2007).

2. The threshold for $G_{n,p}$ to contain any non-trivial rigid substructure is $p = C_2/n$, for a constant $C_2 \approx 3.588$; when a non-trivial rigid substructure emerges, it is linear-sized (Kasiviswanathan et al., 2011).



The transition from rigid to flexible happens exactly at the threshold for the minimum degree of $G_{n,p}$ to reach 2, echoing the coupon collector's bound for low-rank matrix completion. The proofs of both parts of Theorem 1.3 are basically combinatorial, and rely in an essential way on Laman's Theorem (1970). Asymptotic results for $r \ge 3$ are not known. Our framework permits the derivation of such results that generalize Theorem 1.3.



1.4. Contributions 



Coherence and reconstruction In Section 2 we develop the theory of algebraic compressed sensing. Our main result relates the coherence of a variety $X$ to the sampling density of a generic (constrained) signal $x \in X$ which is needed to achieve reconstruction of $x$, up to finite choice. We show:

Theorem 1.4. Let $X \subset \mathbb{K}^n$ be an irreducible algebraic variety, let $\Omega$ be the projection onto a set of coordinates, chosen independently with probability $p$, let $x \in X$ be a generic point on $X$. There is an absolute constant $C$ such that if

$$p \ge C \cdot \lambda \cdot \operatorname{coh}(X) \cdot \log n, \quad \text{with } \lambda \ge 1,$$

then $x$ is reconstructible from $\Omega(x)$ - i.e., $\Omega^{-1}(\Omega(x))$ is finite - with probability at least $1 - 3n^{-\lambda}$.

Here generic can be taken to mean that either $x$ is sampled from a continuous probability density on $X$ or taken from a certain Zariski dense subset of $X$. This generalizes Theorems 1.2 and 1.3 in several directions: (1) it applies more broadly, since it requires only a bound on the coherence of $X$; (2) the set of generic points in $X$ is dense and includes some that are not "incoherent" in the sense of Theorem 1.2; (3) it doesn't rely on the specific combinatorial structure of the problem in the way that Theorem 1.3 does.
Low-rank matrices Our first application of Theorem 1.4 is to derive a variant of Theorem 1.2 which does not depend on properties of $A$:

Theorem 1.5. Let $r \in \mathbb{N}$ be fixed, let $A$ be a generic $(m \times n)$ matrix of rank at most $r$ with $m \le n$. Write $\tilde r = \max(r, \log n)$^6. There is a number $C$, depending only on $r$, such that if

$$p \ge C \cdot \tilde r \cdot n^{-1} \log n,$$

then w.h.p., $A$ can be reconstructed from independent observations with probability $p$ of its entries.

Moreover, if there is a Hadamard matrix of order $n$, the same conclusion holds with $p \ge C \cdot r \cdot n^{-1} \log n$ and $C$ now an absolute constant.

Proof. The first statement follows from Corollary 4.7, the second from Theorem 1.4 together with the coherence bounds from Propositions 2.5 and 4.3. □

We also show that the same bounds hold when A is taken to be symmetric. 



Distance matrices Our second application is to distance matrices. The coherence of the Cayley-Menger variety is obtained by relating it to that of the determinantal variety. This is carried out in Section 5 using tools from Section 3. We obtain:

Theorem 1.6. Let $r \in \mathbb{N}$ be fixed, let $D$ be a generic distance matrix of points in $r$-space. There is a number $C$, depending only on $r$, such that if

$$p \ge C \cdot n^{-1} (\log n)^2,$$

then w.h.p., $D$ can be reconstructed from independent observations with probability $p$ of its entries.

Proof. This is an immediate consequence of Corollary 5.11. □

In the language of Kasiviswanathan et al. (2011), Theorem 1.6 says that with the stated $p$, the random graph $G_{n,p}$ is generically rigid w.h.p. Because the minimum degree of a graph that is generically rigid in dimension $d$ must be at least $d$, the order of the lower bound on $p$ cannot be improved by more than a factor of $\log n$.



2. Coherence and signal reconstruction 

In this section, we provide a framework for examining compressed sensing under algebraic con- 
straints. First we introduce some formal concepts which describe the setting of compressed 
sensing under algebraic constraints, in particular the sampling process which we will assume to 
randomly and independently sample coordinate projections of the signal without repetition. 

Definition 2.1. Let $X \subset \mathbb{K}^n$ be an analytic variety. Fix coordinates $(X_1, \dots, X_n)$ for $\mathbb{K}^n$. Let $S(p)$ be the Bernoulli random experiment yielding a random subset of $\{X_1, \dots, X_n\}$ where each $X_i$ is contained in $S(p)$ independently with probability $p$. We will call the projection map $\Omega : X \to Y$ defined by $(x_1, \dots, x_n) \mapsto (\dots, x_i, \dots : X_i \in S(p))$ of $X$ onto the coordinates in $S(p)$, which is an analytic-map-valued random variable, a random masking of $X$ with selection probability $p$.
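To make Definition 2.1 concrete, the following minimal sketch draws one realization of the random subset $S(p)$ and applies the resulting coordinate projection to a signal. The names `random_masking` and `apply_masking`, the seed, and the example signal are illustrative assumptions and not part of the framework; the sketch ignores the constraint variety $X$ and only shows the sampling step.

```python
import random
from typing import List, Sequence


def random_masking(n: int, p: float, rng: random.Random) -> List[int]:
    """One realization of S(p): each coordinate index 0..n-1 is kept
    independently with probability p."""
    return [i for i in range(n) if rng.random() < p]


def apply_masking(x: Sequence[float], kept: Sequence[int]) -> List[float]:
    """Coordinate projection of the signal x onto the kept coordinates."""
    return [x[i] for i in kept]


rng = random.Random(0)                      # fixed seed, for reproducibility only
x = [0.5, -1.0, 2.0, 3.5, 0.0, 1.25]        # an arbitrary example signal
kept = random_masking(len(x), 0.5, rng)     # indices of the observed coordinates
observed = apply_masking(x, kept)           # the observed entries of x
print(kept, observed)
```

Each call to `random_masking` yields a different projection map, which is the sense in which the masking is a map-valued random variable.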

^6 Since $r$ is fixed, for large enough $n$, $\tilde r = \log n$.
The constraints defined by $X$ play a crucial role in determining the sufficient sampling density which allows reconstruction of the signal. Namely, the central property of $X$ which will determine the sufficient density is the so-called coherence, which describes the degree of randomness of a generic tangent flat to $X$; intuitively, it can be interpreted as the infinitesimal randomness of a signal.

Definition 2.2. Let $H \subset \mathbb{K}^n$ be a $k$-flat^7. Let $P : \mathbb{K}^n \to H \subset \mathbb{K}^n$ be the unitary projection operator onto $H$, let $e_1, \dots, e_n$ be a fixed orthonormal basis of $\mathbb{K}^n$. Then the coherence of $H$ with respect to the basis $e_1, \dots, e_n$ is defined as

$$\operatorname{coh}(H) = \max_{1 \le i \le n} \|P(e_i) - P(0)\|^2.$$

When not stated otherwise, the basis $e_i$ will be the canonical basis of the ambient space.

Remark 2.3. Let $H \subset \mathbb{K}^n$ be a $k$-flat. Then the coherence $\operatorname{coh}(H)$ does not depend on whether we consider $H$ as a $k$-flat in $\mathbb{K}^n$, or as a $k$-flat in $\mathbb{K}^m \supseteq \mathbb{K}^n$ for $m \ge n$ (assuming the chosen basis of $\mathbb{K}^m$ contains the basis of $\mathbb{K}^n$). Moreover, if $H \subseteq \mathbb{R}^n$, the coherence of $H$ equals that of the complex closure of $H$.

The coherence of a $k$-flat is bounded in both directions:

Proposition 2.4. Let $H$ be a $k$-flat in $\mathbb{K}^n$. Then, $\frac{k}{n} \le \operatorname{coh}(H) \le 1$, and the upper bound is tight.

Proof. Without loss of generality, we can assume that $0 \in H$ and therefore that $P$ is linear, since coherence, as defined in Definition 2.2, is invariant under translation of $H$.

First we show the upper bound. For that, note that for a unitary projection operator $P : \mathbb{K}^n \to H$ and any $x \in \mathbb{K}^n$, one has $\|P(x)\| \le \|x\|$. Thus, by definition,

$$\operatorname{coh}(H) = \max_{1 \le i \le n} \|P(e_i)\|^2 \le \max_{1 \le i \le n} \|e_i\|^2 = 1.$$

For tightness, take $H$ as the span of $e_1, \dots, e_k$.

Let us now show the lower bound. We proceed by contradiction. Assume $\|P(e_i)\|^2 < \frac{k}{n}$ for all $i$. This would imply

$$k = n \cdot \frac{k}{n} > \sum_{i=1}^{n} \|P(e_i)\|^2 = \|P\|_F^2 = k,$$

which is a contradiction, where in the last equality we used the fact that orthogonal projections onto a $k$-dimensional space have squared Frobenius norm $k$. □
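The bounds of Proposition 2.4 can be checked on small examples. The sketch below is an illustration under simplifying assumptions: it treats only real linear subspaces (flats through 0) with the canonical basis, and computes $\|P(e_i)\|^2$ as the sum of squared $i$-th entries of an orthonormal basis of $H$ obtained by Gram-Schmidt. The helper names are hypothetical.

```python
import math
from typing import List, Sequence


def orthonormal_basis(vectors: Sequence[Sequence[float]], tol: float = 1e-12) -> List[List[float]]:
    """Modified Gram-Schmidt: an orthonormal basis of the span of `vectors`."""
    basis: List[List[float]] = []
    for v in vectors:
        w = list(v)
        for b in basis:
            dot = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - dot * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > tol:                       # skip linearly dependent vectors
            basis.append([wi / norm for wi in w])
    return basis


def coherence(vectors: Sequence[Sequence[float]]) -> float:
    """coh(H) = max_i ||P e_i||^2 for H = span(vectors), canonical basis e_i."""
    basis = orthonormal_basis(vectors)
    n = len(vectors[0])
    # For an orthonormal basis {b} of H, ||P e_i||^2 = sum_b b[i]^2.
    return max(sum(b[i] ** 2 for b in basis) for i in range(n))


line = coherence([[1.0, 1.0, 1.0]])                       # k = 1, n = 3
axis = coherence([[1.0, 0.0, 0.0]])                       # k = 1, n = 3
plane = coherence([[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]])     # k = 2, n = 3
print(line, axis, plane)
```

In this example the diagonal line attains the lower bound $k/n = 1/3$, while the coordinate axis attains the upper bound 1.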



A similar definition of coherence, as in Definition 2.2, was used by Candès and Recht (2009), with different normalization. The following proposition states how close the coherence of flats comes to the lower bound:

Proposition 2.5. Denote $b(k, n) = \inf_{H \subset \mathbb{K}^n} \operatorname{coh}(H)$, where the infimum is taken over all $k$-flats $H$.

(i) Assume there exists an $(n \times n)$ Hadamard matrix^8. Then $b(k, n) = \frac{k}{n}$.

(ii) There is a number $C$, depending only on $k$, such that $b(k, n) \le C \cdot \frac{\tilde k_n}{n}$, where we write $\tilde k_n = \max(k, \log n)$.

^7 A $k$-flat is a linear subspace of dimension $k$ which does not necessarily contain 0. Other names are affine subspace or affine linear variety.

^8 A Hadamard matrix is a not necessarily symmetric square matrix with entries ±1 and mutually orthogonal rows.
Proof. (i) It suffices to provide a $k$-flat $H$ such that $\operatorname{coh}(H) = \frac{k}{n}$ with respect to $e_1, \dots, e_n$. By applying a unitary transform, we can assume that $H = \operatorname{span}(e_1, \dots, e_k)$ and we need to provide a unitary system of coordinate vectors $v_1, \dots, v_n$ such that $\operatorname{coh}(H) = \frac{k}{n}$ with respect to $v_1, \dots, v_n$. We claim that we can take $(v_1, \dots, v_n) = \frac{1}{\sqrt{n}} M$, where $M$ is an $(n \times n)$ Hadamard matrix. Indeed, it holds that $\operatorname{coh}(H) = \sum_{i=1}^{k} \left(\frac{1}{\sqrt{n}}\right)^2 = \frac{k}{n}$.

(ii) follows from Lemma 2.2 of Candès and Recht (2009). □
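Part (i) of the proof can be verified directly for orders that are powers of two, where a Hadamard matrix is available through the Sylvester construction. The sketch below is an illustration only: instead of changing the basis, it equivalently takes $H$ to be the span of the first $k$ rows of $M/\sqrt{n}$ and measures coherence in the canonical basis. The function names are hypothetical.

```python
from typing import List


def sylvester_hadamard(order: int) -> List[List[int]]:
    """Hadamard matrix of the given order via the Sylvester construction.
    Only powers of two are supported by this construction."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    h = [[1]]
    while len(h) < order:
        # H_{2n} = [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def coherence_of_first_rows(h: List[List[int]], k: int) -> float:
    """Coherence of the span of the first k rows of h / sqrt(n), canonical basis.
    The rows of h / sqrt(n) are orthonormal, so ||P e_i||^2 = sum_j h[j][i]^2 / n."""
    n = len(h)
    return max(sum(h[j][i] ** 2 for j in range(k)) / n for i in range(n))


M = sylvester_hadamard(8)
coh = coherence_of_first_rows(M, 3)   # k = 3, n = 8
print(coh)
```

Because every entry of $M$ is ±1, each squared entry of $M/\sqrt{n}$ equals $1/n$, so the maximum over $i$ is attained everywhere and equals $k/n$.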

Definition 2.6. Let $X \subset \mathbb{K}^n$ be a (real or complex) irreducible analytic variety of dimension $d$ (affine or projective). Let $x \in X$ be a smooth point, and let $T_x X$ be the tangent $d$-flat of $X$ at $x$. We define

$$\operatorname{coh}(x \in X) := \operatorname{coh}(T_x X).$$

If it is clear from the context in which variety we consider $x$ to be contained, we also write $\operatorname{coh}(x) = \operatorname{coh}(x \in X)$. Furthermore, we define the coherence of $X$ to be

$$\operatorname{coh}(X) = \inf_{x \in \operatorname{Sm}(X)} \operatorname{coh}(x),$$

where $\operatorname{Sm}(X)$ denotes the smooth locus of $X$.



Note that Remark 2.3 implies that the coherence $\operatorname{coh}(X)$ does not depend on the size of the ambient space. Also, if $X$ is a $k$-flat, then the definitions of $\operatorname{coh}(X)$ given by Definitions 2.2 and 2.6 agree. These definitions, together with Proposition 2.4, imply:

Proposition 2.7. Let $X \subset \mathbb{K}^n$ be an irreducible analytic variety. Then, $\frac{1}{n} \dim X \le \operatorname{coh}(X) \le 1$.

Proof. Let $d = \dim X$. Irreducibility of $X$ implies that, at each smooth point $x \in \operatorname{Sm}(X)$, the tangent space $T_x X$ is a $d$-flat in $\mathbb{K}^n$. Both bounds then follow from Proposition 2.4. □



For ease of notation, we also define the incoherence as one minus coherence: 

Definition 2.8. For $X \subset \mathbb{K}^n$ an irreducible analytic variety, and $x \in X$, we define the incoherence

$$\operatorname{incoh}(x) = 1 - \operatorname{coh}(x) \quad \text{and} \quad \operatorname{incoh}(X) = 1 - \operatorname{coh}(X).$$

With these tools in place, we can prove our main result, which we recall from the introduction.

Theorem 1.4. Let $X \subset \mathbb{K}^n$ be an irreducible algebraic variety, let $\Omega$ be the projection onto a set of coordinates, chosen independently with probability $p$, let $x \in X$ be a generic point on $X$. There is an absolute constant $C$ such that if

$$p \ge C \cdot \lambda \cdot \operatorname{coh}(X) \cdot \log n, \quad \text{with } \lambda \ge 1,$$

then $x$ is reconstructible from $\Omega(x)$ - i.e., $\Omega^{-1}(\Omega(x))$ is finite - with probability at least $1 - 3n^{-\lambda}$.

Sketch of proof: The argument, which is in Appendix B, integrates some ideas of Candès and Recht (2009) into our general algebraic setting.

Remark 2.9. By the bounds given in Proposition 2.7, the best obtainable bound in Theorem 1.4 is $pn \ge C \cdot \lambda \cdot \dim(X) \cdot \log n$, with $\lambda \ge 1$.
3. Coherence of subvarieties and secants

In the following, we will prove some bounds relating the coherence of different varieties to each other.

Lemma 3.1. Let $H \subseteq \mathbb{K}^n$ be a $k$-flat, let $X \subseteq H$ be a subvariety. Then, $\operatorname{coh}(X) \le \operatorname{coh}(H)$.

Proof. We first prove the statement for the case where $X$ is a flat; without loss of generality one can then assume that $0 \in X$. Let $P'$ be the unitary projection onto $X$, similarly $P$ the unitary projection onto $H$. Since $X \subseteq H$, it holds that $\|P'x\| \le \|Px\|$ for any $x \in \mathbb{K}^n$. Thus, $\operatorname{coh}(X) \le \operatorname{coh}(H)$.

The statement for the case where $X$ is an irreducible variety follows from the statement for vector spaces. Namely, for $x \in X$, it implies $\operatorname{coh}(x \in X) \le \operatorname{coh}(H)$, since the tangent space of $X$ at $x$ is contained in $H$. By taking the infimum, we obtain the statement. □



Corollary 3.2. As in Proposition 2.5, denote $b(k, n) = \inf_{H \subset \mathbb{K}^n} \operatorname{coh}(H)$, where the infimum is taken over all $k$-flats $H$.

(i) For any $\ell \in \mathbb{N}$, it holds that $b(k, n) \le b(k + \ell, n + \ell)$.

(ii) If Paley's conjecture on Hadamard matrices^9 is true, then

$$b(k, n) \le \frac{k + c(n)}{n + c(n)}, \quad \text{where } c(n) = 4 \left\lceil \frac{n}{4} \right\rceil - n.$$

Proof. (i) Let $H$ be a generic $(k + \ell)$-flat in $\mathbb{K}^{n+\ell}$, let $H_0$ be the $n$-flat spanned by any $n$ orthonormal basis vectors of $\mathbb{K}^{n+\ell}$. Then $H \cap H_0$ will be a generic $k$-flat in $H_0 \cong \mathbb{K}^n$, and Lemma 3.1 implies $\operatorname{coh}(H \cap H_0) \le \operatorname{coh}(H)$. Since any $k$-flat in $H_0$ can be obtained in this way, and the possible $H$ are dense in the set of all $(k + \ell)$-flats in $\mathbb{K}^{n+\ell}$, the statement follows. (ii) The statement follows from applying (i) to $\ell = c(n)$ and Paley's conjecture. □

Note that Corollary 3.2 (i) is not a contradiction to the lower bound in Proposition 2.4, since it always holds that $\frac{k}{n} \le \frac{k+\ell}{n+\ell}$ due to the fact that $k \le n$.

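Lemma 3.3. Let $X, Y \subset \mathbb{K}^n$ be analytic varieties, let $X + Y = \{x + y \;;\; x \in X, y \in Y\}$ be the sum of $X$ and $Y$. Then, $\operatorname{coh}(X) \le \operatorname{coh}(X + Y)$.

Proof. Denote $Z = X + Y$, let $z \in Z$ be an arbitrary smooth point. By definition, there are smooth $x(z) \in X$, $y(z) \in Y$ such that $z = x(z) + y(z)$. Let $T_z$ be the tangent space to $X + Y$ at $z$, let $T_x$ be the tangent space of $X$ at $x(z)$. An elementary calculation shows $T_x \subseteq T_z$, thus $\operatorname{coh}(x(z)) \le \operatorname{coh}(z)$ by Lemma 3.1. Since $z$ was arbitrary, we have $\operatorname{coh}(X) \le \inf_{z \in \operatorname{Sm}(Z)} \operatorname{coh}(x(z)) \le \operatorname{coh}(Z)$. □

Remark 3.4. In general, it is false that $\operatorname{coh}(X + Y) \le \operatorname{coh}(X) + \operatorname{coh}(Y)$. Consider for example $X = \operatorname{span}((1, 1, 1)^\top)$ and $Y = \operatorname{span}((1, -1, -1)^\top)$.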
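The example of Remark 3.4 can be worked out explicitly. The following computation is our own elaboration under Definition 2.2 with the canonical basis of $\mathbb{R}^3$; it uses that the projection of $e_i$ onto a line spanned by $v$ has squared norm $(v^\top e_i)^2 / \|v\|^2$.

```latex
\begin{align*}
\operatorname{coh}(X) &= \max_{1 \le i \le 3} \frac{\big((1,1,1)\, e_i\big)^2}{\|(1,1,1)^\top\|^2} = \frac{1}{3},
&
\operatorname{coh}(Y) &= \max_{1 \le i \le 3} \frac{\big((1,-1,-1)\, e_i\big)^2}{\|(1,-1,-1)^\top\|^2} = \frac{1}{3}, \\
e_1 &= \tfrac{1}{2}\big((1,1,1)^\top + (1,-1,-1)^\top\big) \in X + Y
&&\Longrightarrow\quad
\operatorname{coh}(X + Y) \ge \|P_{X+Y}(e_1)\|^2 = \|e_1\|^2 = 1 .
\end{align*}
```

Hence $\operatorname{coh}(X + Y) = 1 > \frac{2}{3} = \operatorname{coh}(X) + \operatorname{coh}(Y)$, so subadditivity fails in this example.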



^9 Paley's conjecture states that there exists an $(n \times n)$ Hadamard matrix for every $n = 4k$, $k \in \mathbb{N}$.
4. Low-rank matrix completion: the determinantal variety

In this section, Theorem 1.4 is applied to obtain sampling densities for low-rank matrix completion in the symmetric and non-symmetric case.

Definition 4.1. We will denote by $\mathcal{M}(m \times n, r)$ the set of $(m \times n)$ matrices in $\mathbb{K}$ of rank $r$ or less, and by $\mathcal{M}_{\mathrm{sym}}(n, r)$ the set of symmetric real resp. Hermitian complex $(n \times n)$ matrices of rank $r$ or less, i.e.,

$$\mathcal{M}(m \times n, r) = \{A \in \mathbb{K}^{m \times n} \;;\; \operatorname{rk} A \le r\}, \quad \text{and}$$
$$\mathcal{M}_{\mathrm{sym}}(n, r) = \{A \in \mathbb{K}^{n \times n} \;;\; \operatorname{rk} A \le r,\; A^{\ast} = A\}.$$

Since the matrices in $\mathcal{M}_{\mathrm{sym}}(n, r)$ are symmetric resp. Hermitian, we will consider it as canonically embedded in $\frac{1}{2} n(n + 1)$-space.

$\mathcal{M}(m \times n, r)$ is called the determinantal variety of $(m \times n)$-matrices of rank (at most) $r$, $\mathcal{M}_{\mathrm{sym}}(n, r)$ the determinantal variety of symmetric $(n \times n)$-matrices of rank (at most) $r$.



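As a corollary to Lemma 3.3, we obtain

Corollary 4.2. For any $m, n, r$, it holds that

$$\operatorname{coh}(\mathcal{M}(m \times n, r)) \le \operatorname{coh}(\mathcal{M}(m \times n, r + 1)) \quad \text{and} \quad \operatorname{coh}(\mathcal{M}_{\mathrm{sym}}(n, r)) \le \operatorname{coh}(\mathcal{M}_{\mathrm{sym}}(n, r + 1)).$$

Proof. This follows from the equalities $\mathcal{M}(m \times n, r + 1) = \mathcal{M}(m \times n, r) + \mathcal{M}(m \times n, 1)$ and $\mathcal{M}_{\mathrm{sym}}(n, r + 1) = \mathcal{M}_{\mathrm{sym}}(n, r) + \mathcal{M}_{\mathrm{sym}}(n, 1)$, and Lemma 3.3. □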



The following main structural observation for determinantal varieties links the coherence of 
low-rank matrices to the coherence of the row and column spans. 

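Proposition 4.3. Let $A \in \mathbb{K}^{m \times n}$, let $H_n$ be the row span of $A$, and $H_m$ the column span of $A$. Then, $\operatorname{incoh}(A \in \mathcal{M}(m \times n, r)) = \operatorname{incoh}(H_n) \cdot \operatorname{incoh}(H_m)$ and, if $A$ is Hermitian, $\operatorname{incoh}(A \in \mathcal{M}_{\mathrm{sym}}(n, r)) = \operatorname{incoh}(H_n)^2 = \operatorname{incoh}(H_m)^2$.

Proof. The calculation leading to (Candès and Recht, 2009, Equation 4.9) shows in both cases that $\operatorname{coh}(A) = \operatorname{coh}(H_n) + \operatorname{coh}(H_m) - \operatorname{coh}(H_n) \operatorname{coh}(H_m)$, from which the statement follows. □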

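Corollary 4.4. It holds that

$$\operatorname{coh}(\mathcal{M}(m \times n, r)) = 1 - \sup \operatorname{incoh}(H_n) \operatorname{incoh}(H_m), \quad \text{and}$$
$$\operatorname{coh}(\mathcal{M}_{\mathrm{sym}}(n, r)) = 1 - \sup \operatorname{incoh}(H_n)^2,$$

where the suprema range over all $r$-flats $H_m$ in $m$-space and $r$-flats $H_n$ in $n$-space. In particular, there are numbers $C, C'$, depending only on $r$, such that

$$\operatorname{coh}(\mathcal{M}(m \times n, r)) \le C (mn)^{-1} \left(m \tilde r_n + n \tilde r_m - \tilde r_m \tilde r_n\right) \quad \text{and}$$
$$\operatorname{coh}(\mathcal{M}_{\mathrm{sym}}(n, r)) \le C' n^{-2} \left(2n \tilde r_n - \tilde r_n^2\right),$$

where we write $\tilde r_k = \max(r, \log k)$.

Proof. The first statement follows from the fact that for any pair of $r$-flats $H_m$ and $H_n$ in $m$- resp. $n$-space, there exists an $A \in \mathcal{M}(m \times n, r)$ such that the column resp. row span of $A$ is $H_m$ resp. $H_n$. The second statement follows from the bound in Proposition 2.4. □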



Remark 4.5. Keep the notation of Proposition 2.5 and Corollary 3.2. If $b(r, m) = \frac{r}{m}$ and $b(r, n) = \frac{r}{n}$, then Corollary 4.4 together with Proposition imply

$$\operatorname{coh}(\mathcal{M}(m \times n, r)) = \frac{r}{mn} (m + n - r) = \frac{1}{mn} \dim(\mathcal{M}(m \times n, r)).$$
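The identity in Remark 4.5 is a one-line computation, $1 - (1 - \frac{r}{m})(1 - \frac{r}{n}) = \frac{r}{m} + \frac{r}{n} - \frac{r^2}{mn} = \frac{r(m+n-r)}{mn}$, and can be checked in exact arithmetic. The sketch below assumes that the infimal coherences $r/m$ and $r/n$ are attained by the column and row spans; the function names and the sample values of $m, n, r$ are illustrative.

```python
from fractions import Fraction


def coh_from_flats(coh_row: Fraction, coh_col: Fraction) -> Fraction:
    """coh(A) = 1 - incoh(H_n) * incoh(H_m), in the form of Proposition 4.3."""
    return 1 - (1 - coh_row) * (1 - coh_col)


def determinantal_dim(m: int, n: int, r: int) -> int:
    """dim M(m x n, r) = r (m + n - r)."""
    return r * (m + n - r)


m, n, r = 6, 10, 2
lhs = coh_from_flats(Fraction(r, n), Fraction(r, m))          # 1 - (1 - r/n)(1 - r/m)
rhs = Fraction(determinantal_dim(m, n, r), m * n)             # dim / (mn)
print(lhs, rhs)
```

In this best case the coherence of the determinantal variety equals its dimension divided by the dimension of the ambient space, matching the lower bound of Proposition 2.7.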

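Corollary 4.6. $\operatorname{coh}(\mathcal{M}(n \times n, r)) = \operatorname{coh}(\mathcal{M}_{\mathrm{sym}}(n, r))$.

Proof. $\operatorname{coh}(\mathcal{M}(n \times n, r)) \le \operatorname{coh}(\mathcal{M}_{\mathrm{sym}}(n, r))$ follows from Proposition 4.3 by considering $\mathcal{M}(n \times n, r) \supseteq \mathcal{M}_{\mathrm{sym}}(n, r)$. For the converse, let $A \in \mathcal{M}(n \times n, r)$. It suffices to show that there is $M \in \mathcal{M}_{\mathrm{sym}}(n, r)$ with $\operatorname{coh}(M) \le \operatorname{coh}(A)$. Let $H_1, H_2$ be row and column span of $A$, such that $\operatorname{coh}(H_1) \le \operatorname{coh}(H_2)$. Choosing an $M$ with column (and thus also row) span $H_1$ yields, by Proposition 4.3, an $M$ with $\operatorname{coh}(M) \le \operatorname{coh}(A)$. □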

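Corollary 4.7. Let $\Omega$ be a random masking of $\mathcal{M}(m \times n, r)$, or $\mathcal{M}_{\mathrm{sym}}(n, r)$ (in which case we set $m = n$), with sampling probability $p$. Let $A$ be a generic matrix in $\mathcal{M}(m \times n, r)$, or $\mathcal{M}_{\mathrm{sym}}(n, r)$. Write $\tilde r_k = \max(r, \log k)$. Then there is a number $C$, depending only on $r$, such that if

$$p \ge C \cdot \lambda \cdot (mn)^{-1} \cdot \left(m \tilde r_n + n \tilde r_m - \tilde r_m \tilde r_n\right) \log(mn), \quad \text{with } \lambda \ge 1,$$

then $\Omega^{-1}(\Omega(A))$ is finite with probability at least $1 - 3(mn)^{-\lambda}$.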

5. Distance matrix completion: the Cayley-Menger variety 

In this section, we will prove a bound on the coherence of the Cayley-Menger variety, i.e., the set 
of Euclidean distance matrices. The proof strategy is linking it to the case of symmetric low-rank 
matrices. We first introduce notation for the various occurring manifolds. 

Definition 5.1. Assume $r \le m \le n$. We will denote by $\mathcal{C}(n, r)$ the set of $(n \times n)$ real Euclidean distance matrices of points in $r$-space, i.e.,

$$\mathcal{C}(n, r) = \{D \in \mathbb{K}^{n \times n} \;;\; D_{ij} = (x_i - x_j)^\top (x_i - x_j) \text{ for some } x_1, \dots, x_n \in \mathbb{K}^r\}.$$

Since the elements of $\mathcal{C}(n, r)$ are symmetric, and have zero diagonals, we will consider $\mathcal{C}(n, r)$ as canonically embedded in $\binom{n}{2}$-space.

$\mathcal{C}(n, r)$ is called the Cayley-Menger variety of $n$ points in $r$-space.

We will now continue with introducing maps related to the above sets:

Definition 5.2. We define canonical surjections

$$\varphi : (\mathbb{K}^r)^n \to \mathcal{C}(n, r); \quad (x_1, \dots, x_n) \mapsto D \text{ s.t. } D_{ij} = (x_i - x_j)^\top (x_i - x_j),$$
$$\phi : (\mathbb{K}^r)^n \to \mathcal{M}_{\mathrm{sym}}(n, r); \quad (x_1, \dots, x_n) \mapsto A \text{ s.t. } A_{ij} = x_i^\top x_j.$$

Note that $\varphi, \phi$ depend on $r$ and $n$, but these are not explicitly written as parameters in order to keep notation simple. Which map is referred to will be clear from the format of the argument.
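The two maps of Definition 5.2 are easy to evaluate on a point configuration. The following sketch, with hypothetical function names and an arbitrary planar example, computes the distance matrix $D$ and the Gram matrix $A$ of the same points and checks the elementary relation $D_{ij} = A_{ii} + A_{jj} - 2A_{ij}$ that links them. It covers the real case only.

```python
from typing import List, Sequence


def gram_matrix(points: Sequence[Sequence[int]]) -> List[List[int]]:
    """Gram map: A_ij = x_i^T x_j."""
    return [[sum(a * b for a, b in zip(x, y)) for y in points] for x in points]


def distance_matrix(points: Sequence[Sequence[int]]) -> List[List[int]]:
    """Distance map: D_ij = (x_i - x_j)^T (x_i - x_j), the squared Euclidean distance."""
    return [[sum((a - b) ** 2 for a, b in zip(x, y)) for y in points] for x in points]


points = [(0, 0), (3, 0), (0, 4), (1, 1)]   # n = 4 points in r = 2 dimensions
A = gram_matrix(points)
D = distance_matrix(points)

# Expanding the square gives D_ij = A_ii + A_jj - 2 A_ij for all i, j.
n = len(points)
relation_holds = all(
    D[i][j] == A[i][i] + A[j][j] - 2 * A[i][j] for i in range(n) for j in range(n)
)
print(D, relation_holds)
```

This relation is the basic reason why statements about symmetric low-rank matrices can be transferred to distance matrices.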
We now define a "normalized version" of $\mathcal{M}_{\mathrm{sym}}(n, r)$:

Definition 5.3. Denote by $S^r = \{x \in \mathbb{K}^{r+1} \;;\; x^\top x = 1\}$. Then, define $\mathcal{M}^{\ast}_{\mathrm{sym}}(n, r + 1) := \phi((S^r)^n)$. Since $\mathcal{M}^{\ast}_{\mathrm{sym}}(n, r + 1)$ contains only symmetric matrices with diagonal entries one, we will consider it as a subset of $\binom{n}{2}$-space.

Remark 5.4. The maps $\varphi, \phi$ are algebraic maps, and the sets $\mathcal{C}(n, r)$, $\mathcal{M}(m \times n, r)$, $\mathcal{M}_{\mathrm{sym}}(n, r)$, $\mathcal{M}^{\ast}_{\mathrm{sym}}(n, r)$ are irreducible algebraic varieties^10.

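Lemma 5.5. For arbitrary $n, r$, one has $\operatorname{coh}(\mathcal{M}_{\mathrm{sym}}(n, r)) = \operatorname{coh}(\phi((\mathbb{K}^r)^n))$.

Proof. If $\mathbb{K} = \mathbb{C}$, then $\phi$ is surjective, so the statement follows. If $\mathbb{K} = \mathbb{R}$, note that the coherence of a general matrix does not depend on the variety it is considered in, since $\dim \mathcal{M}_{\mathrm{sym}}(n, r) = \dim \phi((\mathbb{K}^r)^n)$. Take $M \in \mathcal{M}_{\mathrm{sym}}(n, r)$. Then, take any matrix $A \in \mathbb{R}^{r \times n}$ whose rows are a basis for the row span of $M$. Then, $A^\top A \in \phi((\mathbb{R}^r)^n)$, and by Proposition 4.3, $\operatorname{coh}(M) = \operatorname{coh}(A^\top A)$. The statement follows from this. □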

The dimensions of the above varieties are classically known: 

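Proposition 5.6. One has $\dim \mathcal{C}(n, r) = \dim \mathcal{M}^{\ast}_{\mathrm{sym}}(n, r + 1) = r \cdot n - \binom{r+1}{2}$. The dimensions are the same for the complex closures.

Central in the proof will be the following map:

Definition 5.7. For $h \in \mathbb{K}$, we will denote by

$$\nu_h : \mathbb{K}^r \to S^r \;;\; x \mapsto \frac{1}{\sqrt{x^\top x + h^2}} (x, h)$$

the map which considers a point $x \in \mathbb{R}^r$ as a point in the hyperplane $\{(x, h) \;;\; x \in \mathbb{R}^r\} \subset \mathbb{R}^{r+1}$ and projects it onto $S^r$ (if $\mathbb{K} = \mathbb{C}$, we fix any branch of the square root).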
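The role of the parameter $h$ can be illustrated numerically. The sketch below is a point-level analogy of our own and not the tangent-flat statement used later: for real points it evaluates $\nu_h$, checks that the image lies on the unit sphere, and shows that the squared chordal distance between two images, rescaled by $h^2$, approaches the squared Euclidean distance of the original points for large $h$. The function names, the two sample points and the value of $h$ are illustrative.

```python
import math
from typing import List, Sequence


def nu(x: Sequence[float], h: float) -> List[float]:
    """nu_h(x) = (x, h) / sqrt(x^T x + h^2), a point on the unit sphere S^r."""
    scale = math.sqrt(sum(c * c for c in x) + h * h)
    return [c / scale for c in x] + [h / scale]


def squared_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


x, y = (1.0, 2.0), (3.0, -1.0)
h = 1000.0
u, v = nu(x, h), nu(y, h)

unit_norm = sum(c * c for c in u)                 # should equal 1 up to rounding
scaled_chord = h * h * squared_distance(u, v)     # approaches |x - y|^2 as h grows
print(unit_norm, scaled_chord, squared_distance(x, y))
```

Intuitively, for large $h$ the points $\nu_h(x_i)$ cluster near the pole of the sphere, where the sphere is nearly flat, so sphere configurations locally resemble rescaled Euclidean ones.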

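Proposition 5.8. For any $n, r$, one has $\operatorname{coh}(\mathcal{C}(n, r)) \le \operatorname{coh}(\mathcal{M}^{\ast}_{\mathrm{sym}}(n, r + 1))$.

Proof. Lemma 5.10 implies that

$$\operatorname{coh}(\varphi((\mathbb{K}^r)^n)) \le \operatorname{coh}(\phi((\mathbb{K}^{r+1})^n)),$$

the claim then follows from $\varphi((\mathbb{K}^r)^n) \subseteq \mathcal{C}(n, r)$ and Lemma 5.5. □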



We can bound the coherence of $\mathcal{M}^{\ast}_{\mathrm{sym}}(n, r)$ as follows:

Proposition 5.9. There is a number $C$, depending only on $r$, such that $\operatorname{coh}(\mathcal{M}^{\ast}_{\mathrm{sym}}(n, r)) \le C \frac{\tilde r_n}{n}$, where we write $\tilde r_n = \max(r, \log n)$.

Proof. Lemma 2.2 of Candès and Recht (2009) proves that, for any fixed set of singular values, there exists a matrix $M \in \mathcal{M}(n \times n, r)$ with $\operatorname{coh}(M) \le C n^{-1} \tilde r_n$ such that $M$ has these singular values. By taking the singular values of $M$ to be all one, and replacing $M$ with a symmetric matrix $M'$ having the same row or column span as $M$, as in the proof of Corollary 4.6, we see by Proposition 4.3 that $\operatorname{coh}(M' \in \mathcal{M}^{\ast}_{\mathrm{sym}}(n, r)) \le \operatorname{coh}(M)$. □

^10 Irreducibility for $\mathcal{C}(n, r)$, $\mathcal{M}_{\mathrm{sym}}(n, r)$, $\mathcal{M}^{\ast}_{\mathrm{sym}}(n, r)$ follows from irreducibility of the respective ranges of the complex closure of the maps $\varphi, \phi$ and surjectivity; irreducibility of $\mathcal{M}(m \times n, r)$ can be shown in a similar way. Note that the real maps are in general not surjective.



Our stated bounds on the number of samples required for distance matrix reconstruction then follow from the following lemma, which is proved in Appendix C.

Lemma 5.10. Let $x_1, \dots, x_n \in \mathbb{R}^r$. Let $D = \varphi(x_1, \dots, x_n)$ and $A = \phi(\nu_h(x_1), \dots, \nu_h(x_n))$, let $T_D$, $T_A$ be the respective tangent flats. Then, for $h \to \infty$, we have convergence $T_A \to T_D$, where we consider the tangent flats as points on the real Grassmann manifold of $\left(r \cdot n - \binom{r+1}{2}\right)$-flats in $\binom{n}{2}$-space.

Corollary 5.11. Let $\Omega$ be a random masking of $\mathcal{C}(n, r)$, with sampling probability $p$. Let $D$ be a generic distance matrix. Then there is a number $C$, depending only on $r$, such that if

$$p \ge C n^{-1} (\log n)^2,$$

then $\Omega^{-1}(\Omega(D))$ is finite with probability at least $1 - 3n^{-1}$.

Proof. This follows from Theorem 1.4 and the coherence bounds from Propositions 5.8 and 5.9. □
A. Finiteness of Random Projections

The theorem, which will be proved in this section and which is probably folklore, states that for a general system of coordinates, a number of $\dim(X)$ observations is sufficient for identifiability.

Theorem A.1. Let $X \subset \mathbb{K}^n$ be an algebraic variety or a compact analytic variety, let $\Omega : \mathbb{K}^n \to \mathbb{K}^m$ be a generic linear map. Let $x \in X$ be a smooth point. Then, $X \cap \Omega^{-1}(\Omega(x))$ is finite if and only if $m \ge \dim(X)$, and $X \cap \Omega^{-1}(\Omega(x)) = \{x\}$ if $m > \dim(X)$.

Proof. The theorem follows from the more general height-theorem-like statement that

$$\operatorname{codim}(X \cap H) = \operatorname{codim}(X) + \operatorname{codim}(H) = \operatorname{codim}(X) + n - k,$$

where $H$ is a generic $k$-flat, a proof of which can be found for example in the Appendix of Király et al. (2012b). Then, the first statement about generic finiteness follows by taking a generic $y \in \Omega(X)$ and observing that $\Omega^{-1}(y) = H \cap X$ where $H$ is generic if $k \le \dim(X)$. That implies in particular that if $k = \dim(X)$, then the fiber $\Omega^{-1}(\Omega(x))$ for a generic $x \in X$ consists of finitely many points, which can be separated by an additional generic projection, thus the statement follows. □



Theorem A.1 can be interpreted in two ways. On one hand, it means that any point on X can
be reconstructed, up to finitely many alternatives, from exactly dim(X) random linear projections.
On the other hand, it means that if the chosen coordinate system in which X lives is random,
then dim(X) measurements suffice for (finite) identifiability of the map; no more structural
information is needed. In view of Theorem 1.4 this implies that the log-factor and the
probabilistic phenomena in identifiability occur when the chosen coordinate system is degenerate
with respect to the variety X in the sense that it is intrinsically aligned.
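
The count in Theorem A.1 can be checked numerically in a small case. The following is a minimal sketch, not part of the original argument: it takes X to be the variety of matrices of rank at most r, parametrized by (U, V) -> U V^T, and uses a Gaussian matrix as a stand-in for a generic linear map. The function name `tangent_ranks`, the matrix sizes, and the use of the rank of the differential as a proxy for finiteness of the fiber (in the spirit of Lemma B.1) are assumptions of this sketch.

```python
import numpy as np


def tangent_ranks(n_rows, n_cols, r, k, seed=0):
    """Return (rank of the tangent space, rank of its image in K^k).

    X is the variety of n_rows x n_cols matrices of rank at most r,
    parametrized by (U, V) -> U V^T. A Gaussian matrix stands in for a
    generic linear map Omega (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((n_rows, r))
    V = rng.standard_normal((n_cols, r))

    # Jacobian of (U, V) -> vec(U V^T): one column per parameter.
    cols = []
    for i in range(n_rows):
        for a in range(r):
            dU = np.zeros((n_rows, r))
            dU[i, a] = 1.0
            cols.append((dU @ V.T).ravel())
    for j in range(n_cols):
        for a in range(r):
            dV = np.zeros((n_cols, r))
            dV[j, a] = 1.0
            cols.append((U @ dV.T).ravel())
    J = np.stack(cols, axis=1)

    # Random linear map to K^k, applied to the tangent directions.
    omega = rng.standard_normal((k, n_rows * n_cols))
    rank_tangent = np.linalg.matrix_rank(J, tol=1e-8)
    rank_image = np.linalg.matrix_rank(omega @ J, tol=1e-8)
    return rank_tangent, rank_image


if __name__ == "__main__":
    dim_X = 2 * (4 + 5 - 2)  # dimension of 4 x 5 matrices of rank 2
    for k in (dim_X - 1, dim_X, dim_X + 2):
        print(k, tangent_ranks(4, 5, 2, k))
```

With k equal to dim(X) the projected differential keeps full rank dim(X), which is consistent with finite fibers; with k = dim(X) - 1 the rank necessarily drops below dim(X).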

B. Analytic reconstruction bounds and concentration inequalities 

This appendix collects some analytic criteria and bounds which are used in the proof of
Theorem 1.4. The first lemma relates local injectivity to generic finiteness and contractivity of a
linear map. It is related to (Candes and Recht 2009, Corollary 4.3).



Lemma B.1. Let $\varphi : X \to Y$ be a surjective map of complex algebraic varieties, let $x \in X$, and let
$y = \varphi(x)$ be smooth points of $X$ resp. $Y$. Let

$$d\varphi : T_xX \to T_yY$$

be the induced map of tangent spaces. Then, the following are equivalent:

(i) There is a complex open neighborhood $U \ni x$ such that the restriction $\varphi : U \to \varphi(U)$ is
bijective.

(ii) $d\varphi$ is bijective.

(iii) There exists an invertible linear map $\theta : T_yY \to T_xX$.

(iv) There exists a linear map $\theta : T_yY \to T_xX$ such that the linear map

$$\theta \circ d\varphi - \mathrm{id},$$

where $\mathrm{id}$ is the identity operator, is contractive.

If moreover $X$ is irreducible, then the following is also equivalent:

(v) $\varphi^{-1}(y)$ is finite for generic $y \in Y$.

Here, $T_xX$ is the tangent plane of $X$ at $x$, which is identified with a vector space of formal differentials where $x$ is
interpreted as 0. Similarly, $T_yY$ is identified with the formal differentials around $y$. The linear map $d\varphi$ is induced
by considering $\varphi(x + dv) = y + dv'$ and setting $d\varphi(dv) = dv'$; one checks that this is a linear map since $x, y$ are
smooth. Furthermore, $T_xX$ and $T_yY$ can be endowed with the Euclidean norm and scalar product they inherit from the
tangent planes. Thus, $d\varphi$ is also a linear map of normed vector spaces which is always bounded and continuous, but
not necessarily proper.

Proof. (ii) is equivalent to the fact that the matrix representing $d\varphi$ is an invertible matrix. Thus,
by the properties of the matrix inverse, (ii) is equivalent to (iii), and (ii) is equivalent to (i) by
the constant rank theorem (e.g., 9.6 in Rudin (1976)).

By the upper semicontinuity theorem (1.8, Corollary 3 in Mumford (1999)), (i) is equivalent
to (v) in the special case that $X$ is irreducible.

(ii) $\Rightarrow$ (iv): Since $d\varphi$ is bijective, there exists a linear inverse $\theta : T_yY \to T_xX$ such that
$\theta \circ d\varphi = \mathrm{id}$. Thus

$$\theta \circ d\varphi - \mathrm{id} = 0,$$

which is by definition a contractive linear map.

(iv) $\Rightarrow$ (iii): We proceed by contradiction. Assume that no linear map $\theta : T_yY \to T_xX$ is
invertible. Since $\varphi$ is surjective, $d\varphi$ also is, which implies that for each $\theta$, the linear map $\theta \circ d\varphi$
is rank deficient. Thus, for every $\theta$, there exists a non-zero $\alpha \in \operatorname{Ker} \theta$. By linearity and surjectivity
of $d\varphi$, there exists a non-zero $\beta \in T_xX$ with $d\varphi(\beta) = \alpha$. Without loss of generality we can assume
that $\|\beta\| = 1$, else we multiply $\alpha$ and $\beta$ by the same constant factor. By construction,

$$\|[\theta \circ d\varphi - \mathrm{id}](\beta)\| = \|\theta(\alpha) - \beta\| = \|\beta\| = 1,$$

so $\theta \circ d\varphi - \mathrm{id}$ cannot be contractive. Since $\theta$ was arbitrary, this proves that (iv) cannot hold if (iii) does
not hold, which is equivalent to the claim. □
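
The two directions of the contractivity argument can be illustrated with small random matrices. This is a minimal sketch under stated assumptions: the matrices below are random stand-ins for the differential and for the candidate map in (iv), the tangent space dimensions (4 and 3) are chosen arbitrarily, and the helper names `contraction_gap` and `kernel_vector` are hypothetical.

```python
import numpy as np


def contraction_gap(d_phi, theta):
    """Spectral norm of theta @ d_phi - id on the source tangent space."""
    dim_x = d_phi.shape[1]
    return np.linalg.norm(theta @ d_phi - np.eye(dim_x), 2)


def kernel_vector(d_phi, theta):
    """Unit vector beta with theta(d_phi(beta)) = 0.

    Taken as the right singular vector of theta @ d_phi belonging to the
    smallest singular value; it lies in the kernel whenever theta @ d_phi
    is rank deficient.
    """
    _, _, vt = np.linalg.svd(theta @ d_phi)
    return vt[-1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Bijective differential: the inverse gives theta @ d_phi - id = 0.
    d_phi = rng.standard_normal((4, 4))
    print(contraction_gap(d_phi, np.linalg.inv(d_phi)))

    # Surjective but not injective differential (4-dim source, 3-dim target):
    # every theta leaves a unit vector beta with norm exactly 1 after applying
    # theta @ d_phi - id, so the operator is never contractive.
    d_phi = rng.standard_normal((3, 4))
    theta = rng.standard_normal((4, 3))
    beta = kernel_vector(d_phi, theta)
    print(np.linalg.norm((theta @ d_phi - np.eye(4)) @ beta))
```

The second case mirrors the contradiction step of the proof: the vector beta is annihilated by the composition, so subtracting the identity returns a vector of the same norm.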



The second lemma is a consequence of Rudelson's Lemma, see Rudelson (1999), for Bernoulli
samples.

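Lemma B.2. Let $y_1, \dots, y_M$ be vectors in $\mathbb{R}^n$, let $\varepsilon_1, \dots, \varepsilon_M$ be i.i.d. Bernoulli variables, taking value
1 with probability $p$ and 0 with probability $(1 - p)$. Then,

$$\mathbb{E}\left\| p^{-1} \sum_{i=1}^{M} \varepsilon_i\, y_i \otimes y_i - \mathrm{id} \right\| \le C \sqrt{\frac{\log n}{p}}\, \max_{1 \le i \le M} \|y_i\|$$

with an absolute constant $C$, provided the right hand side is 1 or smaller.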



Footnote to Lemma B.1 (iv): A linear operator $A$ is contractive if $\|A(x)\| < 1$ for all $x$ with $\|x\| \le 1$.

Proof. The statement is exactly Theorem 3.1 in Candes and Romberg (2007), up to a renaming
of variables; the proof can also be found there. It can also be directly obtained from Rudelson's
original formulation in Rudelson (1999) by substituting $\frac{1}{\sqrt{p}}\, y_i$ in the above formulation for $y_i$ in
Rudelson's formulation and upper bounding the right hand side in Rudelson's estimate. □
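
A small Monte Carlo experiment can make the scaling in Lemma B.2 concrete. The sketch below rests on several assumptions that are not taken from the lemma itself: the vectors are the projections $y_i = T(e_i)$ of the coordinate vectors onto a random $d$-dimensional subspace (as in the proof of Theorem 1.4), so that $\sum_i y_i \otimes y_i$ is the identity on that subspace; the quantity simulated is the operator norm of $p^{-1} \sum_i \varepsilon_i\, y_i \otimes y_i - \mathrm{id}$; and the absolute constant $C$ is not computed, only the shape $\sqrt{\log n / p}\, \max_i \|y_i\|$. The function names are hypothetical.

```python
import numpy as np


def masked_deviation(Q, p, rng):
    """Operator norm of p^{-1} * sum_i eps_i y_i y_i^T - id on span(Q).

    Q has orthonormal columns; y_i = Q Q^T e_i has coordinates Q[i, :] in
    the basis Q, so the masked sum becomes Q[keep].T @ Q[keep].
    """
    n, d = Q.shape
    keep = rng.random(n) < p          # i.i.d. Bernoulli(p) selection
    G = Q[keep].T @ Q[keep] / p
    return np.linalg.norm(G - np.eye(d), 2)


def compare_with_bound(n=400, d=5, p=0.5, trials=200, seed=0):
    """Return (empirical mean deviation, sqrt(log n / p) * max_i ||y_i||)."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((n, d)))
    mean_dev = np.mean([masked_deviation(Q, p, rng) for _ in range(trials)])
    bound_shape = np.sqrt(np.log(n) / p) * np.linalg.norm(Q, axis=1).max()
    return float(mean_dev), float(bound_shape)


if __name__ == "__main__":
    for p in (0.1, 0.3, 0.9):
        print(p, compare_with_bound(p=p))
```

Sampling every coordinate (p = 1) gives zero deviation, since the unmasked sum is the identity on the subspace; for smaller p the two returned numbers can be compared across values of p and n.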

Now we proceed to the proof of the main theorem: 

Theorem 1.4. Let $X \subset \mathbb{K}^n$ be an irreducible algebraic variety, let $\Omega$ be the projection onto a set of
coordinates, chosen independently with probability $p$, and let $x \in X$ be a generic point on $X$. There is an
absolute constant $C$ such that if

$$p \ge C \cdot \lambda \cdot \operatorname{coh}(X) \cdot \log n, \quad \text{with } \lambda \ge 1,$$

then $x$ is reconstructible from $\Omega(x)$, i.e., $\Omega^{-1}(\Omega(x))$ is finite, with probability at least $1 - 3n^{-\lambda}$.

Proof. It suffices to prove the theorem for $\mathbb{K} = \mathbb{C}$; for $\mathbb{K} = \mathbb{R}$, we obtain the statement by replacing
X by its Zariski closure in C" and using the assumption that x e M"; observe that < 
#(f^-Hf^(x))nM"). 

By the definition of coherence, for every $\delta > 0$, there exists an $x$ such that $X$ is smooth at
$x$, and $\operatorname{coh}(x) \le (1 + \delta)\operatorname{coh}(X)$. Now let $y = \Omega(x)$; we can assume, by possibly changing $x$, that
$\Omega(X)$ is also smooth at $y$. Let $T_y, T_x$ be the respective tangent spaces at $y$ and $x$. Note that
$y$ is a point-valued discrete random variable, and $T_y$ is a flat-valued random variable. By the
equivalence of the statements (iv) and (v) in Lemma B.1 it suffices to show that the operator

$$P = p^{-1}\, \theta \circ d\Omega - \mathrm{id}$$

is contractive, where $\theta$ is projection from $T_y$ onto $T_x$, with probability at least $1 - 3n^{-\lambda}$ under
the assumptions on $p$. Let $Z = \|P\|$, let $e_1, \dots, e_n$ be the orthonormal coordinate system for
$\mathbb{C}^n$, and let $T$ be the projection onto $T_x$. Then the projection $\theta \circ d\Omega$ has, when we consider $T_x$ to be
embedded into $\mathbb{C}^n$, the matrix representation

$$\sum_{i=1}^{n} \varepsilon_i\, T(e_i) \otimes T(e_i),$$

where $\varepsilon_i$ are independent Bernoulli random variables with probability $p$ for 1 and $(1 - p)$ for 0.
Thus, in matrix representation,

$$P = \sum_{i=1}^{n} \left(\frac{\varepsilon_i}{p} - 1\right) T(e_i) \otimes T(e_i).$$

By Rudelson's Lemma B.2 it follows that

$$\mathbb{E}(Z) \le C \sqrt{\frac{\log n}{p}}\, \max_i \|T(e_i)\|$$

for an absolute constant $C$, provided the right hand side is smaller than 1. The latter is true if and
only if

$$p > C^{2} \log n \, \max_i \|T(e_i)\|^2.$$

Now let $U$ be an open neighborhood of $x$ such that $\operatorname{coh}(z) \le (1 + \delta)\operatorname{coh}(X)$ for all $z \in U$. Then,
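one can write

$$Z = \sup_{y_1, y_2 \in U'} \sum_{i=1}^{n} \left(\frac{\varepsilon_i}{p} - 1\right) \langle y_1, T e_i \rangle \langle y_2, T e_i \rangle$$

with a countable subset $U' \subset U$. By construction of $U'$, one has

$$\left| p^{-1} \langle y_1, T e_i \rangle \langle y_2, T e_i \rangle \right| \le p^{-1} (1 + \delta) \operatorname{coh}(X).$$

Applying Talagrand's Inequality in the form (Candes and Recht 2009, Theorem 9.1), one obtains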
P(||Z - E(Z)|| > t) < 3 exp (-^ log (l + ) 
with an absolute constant $K$ and $B = p^{-1}(1 + \delta)\operatorname{coh}(X)$. Since $\delta$ was arbitrary, it follows that



P(||Z -E(Z)|| > t) < 3exp - 



Substituting $p = C \cdot \lambda \cdot \operatorname{coh}(X) \cdot \log n$, and proceeding as in the proof of Theorem 4.2 in (Candes
and Recht 2009) (while changing absolute constants), one arrives at the statement. □
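
The role of the sampling probability in Theorem 1.4 can be explored empirically for low-rank matrices. The following is a rough sketch under explicit assumptions: X is the variety of n x n matrices of rank at most r, finiteness of the fiber at a random point is tested through the rank of the masked Jacobian of the parametrization (U, V) -> U V^T (using the equivalence of local injectivity and generic finiteness from Lemma B.1), and the function names and default parameters are hypothetical. It does not compute the coherence or the constant C.

```python
import numpy as np


def lowrank_jacobian(n, r, rng):
    """Jacobian of (U, V) -> vec(U V^T) at a random point; rows index entries (i, j)."""
    U = rng.standard_normal((n, r))
    V = rng.standard_normal((n, r))
    J = np.zeros((n * n, 2 * n * r))
    for i in range(n):
        for j in range(n):
            row = i * n + j
            # Derivative of entry (i, j) with respect to row i of U is V[j].
            J[row, i * r:(i + 1) * r] = V[j]
            # Derivative of entry (i, j) with respect to row j of V is U[i].
            J[row, n * r + j * r:n * r + (j + 1) * r] = U[i]
    return J


def finite_fiber_frequency(n, r, p, trials=50, seed=0):
    """Fraction of random masks for which the masked Jacobian has rank dim(X)."""
    rng = np.random.default_rng(seed)
    dim_X = r * (2 * n - r)
    hits = 0
    for _ in range(trials):
        J = lowrank_jacobian(n, r, rng)
        mask = rng.random(n * n) < p      # each entry observed with probability p
        if mask.sum() < dim_X:
            continue                       # too few observations for full rank
        if np.linalg.matrix_rank(J[mask], tol=1e-8) == dim_X:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for p in (0.3, 0.5, 0.7, 0.9):
        print(p, finite_fiber_frequency(n=10, r=2, p=p))
```

Running the loop for increasing n at fixed r gives an empirical view of how small p can be taken before the frequency of full-rank masks falls off.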



C. Tangent space of the Cayley-Menger variety 

In this appendix, we prove: 

Lemma C.1. Let $x_1, \dots, x_n \in \mathbb{R}^r$. Let $D = \psi(x_1, \dots, x_n)$ and $A = \varphi(\nu_h(x_1), \dots, \nu_h(x_n))$, and let $T_D$, $T_A$ be
the respective tangent flats. Then, for $h \to \infty$, we have convergence $T_A \to T_D$, where we consider the
tangent flats as points on the real Grassmann manifold of $\left(r \cdot n - \binom{r+1}{2}\right)$-flats in $\binom{n}{2}$-space.

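Proof. Note that

$$D_{ij} = x_i^\top x_i - 2 x_i^\top x_j + x_j^\top x_j, \qquad A_{ij} = \frac{x_i^\top x_j + h^2}{\sqrt{x_i^\top x_i + h^2}\,\sqrt{x_j^\top x_j + h^2}}.$$

An explicit calculation shows, for $i \neq j$:

$$\left(\frac{\partial D}{\partial x_k}\right)_{ij} = 2(\delta_{ki} - \delta_{kj})(x_i - x_j),$$

$$\left(\frac{\partial A}{\partial x_k}\right)_{ij} = \delta_{ki}\, \frac{(x_i^\top x_i + h^2)\, x_j - (x_i^\top x_j + h^2)\, x_i}{(x_i^\top x_i + h^2)^{3/2}\, (x_j^\top x_j + h^2)^{1/2}} + \delta_{kj}\, \frac{(x_j^\top x_j + h^2)\, x_i - (x_i^\top x_j + h^2)\, x_j}{(x_i^\top x_i + h^2)^{1/2}\, (x_j^\top x_j + h^2)^{3/2}},$$

where $\delta_{ki}$ is the usual Kronecker delta. Thus,

$$\lim_{h \to \infty} h^2 \left(\frac{\partial A}{\partial x_k}\right)_{ij} = -\frac{1}{2} \left(\frac{\partial D}{\partial x_k}\right)_{ij},$$

which implies that $T_A$ converges to $T_D$ in the Grassmann manifold when taking the limit
$h \to \infty$; the statement directly follows. □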
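
The limit used in this proof can be checked numerically. The sketch below assumes that $D_{ij}$ is the squared distance $\|x_i - x_j\|^2$ and that $A_{ij} = (x_i^\top x_j + h^2) / \big(\sqrt{x_i^\top x_i + h^2}\,\sqrt{x_j^\top x_j + h^2}\big)$, as read off from the proof; the derivative formulas in the code are obtained by differentiating these two expressions directly, and the function names and the choice of five random points in the plane are hypothetical.

```python
import numpy as np
from itertools import combinations


def jacobians(x, h):
    """Jacobians of the entries D_ij and A_ij (i < j) with respect to all points x_k."""
    n, r = x.shape
    pairs = list(combinations(range(n), 2))
    JD = np.zeros((len(pairs), n, r))
    JA = np.zeros((len(pairs), n, r))
    for row, (i, j) in enumerate(pairs):
        si2 = x[i] @ x[i] + h**2      # x_i^T x_i + h^2
        sj2 = x[j] @ x[j] + h**2      # x_j^T x_j + h^2
        g = x[i] @ x[j] + h**2        # x_i^T x_j + h^2
        JD[row, i] = 2 * (x[i] - x[j])
        JD[row, j] = 2 * (x[j] - x[i])
        JA[row, i] = (si2 * x[j] - g * x[i]) / (si2**1.5 * sj2**0.5)
        JA[row, j] = (sj2 * x[i] - g * x[j]) / (si2**0.5 * sj2**1.5)
    return JD.reshape(len(pairs), -1), JA.reshape(len(pairs), -1)


def limit_gap(x, h):
    """Largest entry of |h^2 * dA/dx + (1/2) * dD/dx|; tends to 0 as h grows."""
    JD, JA = jacobians(x, h)
    return float(np.abs(h**2 * JA + 0.5 * JD).max())


if __name__ == "__main__":
    x = np.random.default_rng(0).standard_normal((5, 2))
    for h in (1e1, 1e2, 1e3, 1e4):
        print(h, limit_gap(x, h))
```

The printed gap shrinks as h grows, in line with the statement that the rescaled Jacobian of A approaches minus one half of the Jacobian of D.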